Is It Dangerous to Use Version Control Histories to Study Source Code Evolution?
Abstract
Researchers use file-based Version Control Systems (VCSs) as the primary source of code evolution data. Because VCSs are widely used by developers, researchers get easy access to the historical data of many projects. Although convenient, research based on VCS data is incomplete and imprecise. Moreover, answering questions that correlate code changes with other activities (e.g., test runs, refactoring) is impossible with VCS data alone. Our tool, CodingTracker, non-intrusively records fine-grained and diverse data during code development. CodingTracker collected data from 24 developers: 1,652 hours of development, 23,002 committed files, and 314,085 test case runs. This allows us to answer: How much code evolution data is not stored in VCS? How much do developers intersperse refactorings and edits in the same commit? How frequently do developers fix failing tests by changing the test itself? How many changes are committed to VCS without being tested? What is the temporal and spatial locality of changes?
Similar resources
Automatic Generation of Version Control Systems
We describe Bamboo, a system capable of generating working version control systems. Bamboo takes as input a specification of a version control system’s data model expressed using containment modeling, the pattern used to represent version histories, and choices concerning fine-grain version control behavior. Output is generated C language source code for a working version control system and tex...
Detecting Change Patterns in Aspect Oriented Software Evolution: Rule-based Repository Analysis
Interesting information and meta-information about software systems can be extracted by analyzing their evolution histories. This information has proved useful for understanding software evolution, predicting future changes, and performing efficient change impact analysis. A rich source code repository is a prerequisite for a high-quality evolution analysis. Nonetheless, the evolutionar...
Using Accounting Information in Decision Making of Hospitals Managers
The decision-making process requires information, and accounting is the most important source of it. In 1998, the International Federation of Accountants issued a statement about the scope and use of accounting. It identified 4 stages for using accounting information: cost determination, planning and financial control, reduction of resource waste, and value creation. This study was desig...
CVS Data Extraction and Analysis: A Case Study
Version control repositories contain a wealth of detailed information about the evolution of a codebase. In this paper, we outline our experiences parsing and analyzing data from a large collection of CVS repositories created by many students working on a small set of assignments in a second year undergraduate computer science course. We believe the data to be quite unique because rather than a...
Refactoring-Aware Version Control Towards Refactoring Support in API Evolution and Team Development
Today, refactorings are supported in some integrated development environments (IDEs). The refactoring operations can only work correctly if all source code that needs to be changed is available to the IDE. However, this precondition neither holds for application programming interface (API) evolution, nor in team development. The research presented in this paper aims to support refactoring in AP...